
Document supported databricks_retry_args usage for deferrable Databricks operators #68017

Open
kosiew wants to merge 2 commits into
apache:mainfrom
kosiew:callable-deserialization-02-64609

Conversation

@kosiew kosiew commented Jun 4, 2026

Contributor

What does this PR do?

Follow up for #64960

Adds documentation clarifying which databricks_retry_args configurations are supported when using Databricks operators in deferrable mode.

The new documentation:

  • Explains that databricks_retry_args must be serialization-safe because it is serialized across the trigger boundary when deferrable=True.
  • Documents supported value types (plain Python primitives and collections of primitives).
  • Provides examples of supported configurations, such as {"reraise": True}.
  • Explains that retry count and delay should be configured through the dedicated retry_limit and retry_delay operator parameters.
  • Documents unsupported configurations, including Tenacity strategy objects (stop_after_attempt, wait_incrementing, etc.) and arbitrary callables.
  • Recommends using the non-deferrable Databricks operators when custom callable retry strategies are required.
  • Adds a Databricks provider changelog entry describing the documentation update.

Does this PR introduce any user-facing change?

Yes. Documentation now explicitly describes the serialization requirements and supported shapes for databricks_retry_args in deferrable Databricks operators, helping users avoid unsupported retry configurations.

How was this patch tested?

No tests were added or modified. This PR contains documentation and changelog updates only.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    ChatGPT

kosiew added 2 commits June 4, 2026 18:20
…ferrable operators

Add "Retry args in deferrable mode" subsection under
DatabricksSubmitRunDeferrableOperator and DatabricksRunNowDeferrableOperator
explaining:
- Serialization requirement: only plain Python primitives allowed across
  the trigger boundary
- Supported shapes (int/float primitives, nested plain-dict)
- Unsupported shapes (Tenacity objects, callables) with note that a
  ValueError is raised at task submission
- Recommended workaround: use non-deferrable mode for custom retry strategies

Also update changelog for 7.16.0.
@kosiew kosiew force-pushed the callable-deserialization-02-64609 branch from 8b7caa8 to 283df3e Compare June 4, 2026 14:14
eladkal commented Jul 1, 2026

Contributor

cc @moomindani for review

@moomindani moomindani left a comment

Contributor


Thanks for the follow-up — this is exactly the docs fix scoped at the end of the #64960 thread, and the structure of the two new sections is clean (heading levels, anchors, and code blocks all check out). However, two of the content claims do not match what the merged code actually does, and two files should not be in the diff:

  1. The "Supported (serialization-safe and runtime-valid)" example {"reraise": True} is not runtime-valid: BaseDatabricksHook replaces (does not merge) the default retry_args, so stop/wait are lost and tenacity falls back to stop_never + wait_none() — an infinite, zero-backoff retry loop. Setting databricks_retry_args at all also makes retry_limit/retry_delay no-ops. Since tenacity strategy objects (the only way to set stop/wait inside retry_args) are exactly what deferrable mode rejects, the honest guidance is "do not use databricks_retry_args in deferrable mode; use retry_limit/retry_delay", rather than a supported example.
  2. "will raise ValueError at task submission" overstates the timing: validation runs in the trigger constructors at defer time, after submit_run/run_now has already launched the Databricks run. (The #64960 thread did say "before any Databricks API call", but re-checking the merged code, the operator path submits first and defers second — operators/databricks.py:834-836.)
  3. generated/provider_dependencies.json.sha256sum is a stale-base rebase artifact — please rebase onto current main and drop it.
  4. The hand-added changelog entry should be removed (release-manager-maintained file; see inline).

Details inline.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

@@ -1 +1 @@
-2d6f34bb40832f84cb6c121237b1c5b0a05181dccface9fd171558f4df1747dc
+2ccde55d75b93c7fc2c5723fc7f74bf8995244606190c98acf005ea1f39f04ca

Contributor


This file was intentionally deleted and gitignored on main by #68801 (it is auto-regenerated by breeze when needed), so this PR re-introduces a file that no longer exists upstream — and the hash edit itself is byte-for-byte identical to the already-merged #68011. It looks like the branch was cut in the window between #67080 (which mistakenly re-added the file) and #68011. Could you rebase onto current main and drop this file from the diff? This is the same stale-base issue flagged during the #64960 review round.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting


When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the
trigger boundary and must contain only Airflow-serializable values (plain Python primitives
such as ``int``, ``float``, ``str``, ``bool``, ``None``, ``dict``, and ``list``).

Contributor


Two precision issues here:

  1. The boundary is Airflow serde, not plain primitives. #64960 (Databricks: Fail fast for non-serializable retry_args in deferrable operators and triggers) deliberately switched the validator from json.dumps back to airflow.sdk.serde.serialize precisely because serde accepts more than primitives; the merged tests cover a datetime value as supported. Documenting "plain Python primitives" re-narrows what #64960 resolved to keep broad; "Airflow-serde-serializable values" (with primitives as the common case) would be accurate.
  2. Serializability is necessary but not sufficient: the validator checks values only, not whether the keys are valid tenacity.Retrying kwargs. For example {"stop": 3} passes validation, then fails at runtime with TypeError: 'int' object is not callable when tenacity invokes stop(retry_state). Worth stating that passing validation does not mean the retry config is meaningful.
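
A minimal stdlib sketch of the second point, under stated assumptions: `json.dumps` stands in for the serializability check (the merged validator uses Airflow serde, as noted above), and `passes_value_check` and `call_stop` are hypothetical helpers that only mimic a value-only validator and a retry loop invoking `stop(retry_state)`. None of this is the provider's code.

```python
import json


def passes_value_check(retry_args: dict) -> bool:
    """Stand-in validator: only asks whether the values can be serialized."""
    try:
        json.dumps(retry_args)
    except (TypeError, ValueError):
        return False
    return True


def call_stop(retry_args: dict, retry_state: object = None) -> bool:
    """Mimics a retry loop calling the configured stop strategy."""
    stop = retry_args["stop"]
    return stop(retry_state)  # a stop strategy must be callable


retry_args = {"stop": 3}

# The value 3 serializes fine, so a value-only check accepts it.
validated = passes_value_check(retry_args)

# The failure only shows up once something tries to call the "strategy".
try:
    call_stop(retry_args)
    runtime_error = None
except TypeError as exc:
    runtime_error = str(exc)
```

Here `validated` is true while `runtime_error` holds the "not callable" message, which is the gap between "passes validation" and "is a meaningful retry config".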

Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

.. code-block:: python

    # Only plain-primitive Retrying kwarg: reraise
    databricks_retry_args = {"reraise": True}

Contributor


This example is serialization-safe but not runtime-valid as documented. When retry_args is provided, BaseDatabricksHook.__init__ replaces the defaults rather than merging (hooks/databricks_base.py:151-161 — only retry and after are re-injected), so stop and wait are gone and tenacity falls back to stop_never + wait_none(). On a persistently retryable API error this retries forever with zero backoff. It also contradicts the paragraph below: once databricks_retry_args is set, retry_limit/retry_delay are ignored entirely (they are only consulted in the else branch), so the two cannot be combined. Given deferrable mode rejects the tenacity objects that would restore stop/wait, I would replace the "Supported" example with explicit guidance: in deferrable mode, leave databricks_retry_args unset and use retry_limit/retry_delay.
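
The replace-versus-merge behavior described above can be modeled roughly as follows. This is a simplified sketch, not the code in hooks/databricks_base.py: `build_retry_args` is a hypothetical function, and the string values are placeholders for the tenacity objects the hook would actually construct.

```python
def build_retry_args(retry_limit=3, retry_delay=1.0, retry_args=None):
    """Simplified model of how the effective retry configuration is chosen."""
    if retry_args:
        # User-supplied args REPLACE the defaults; only "retry" and "after"
        # are re-injected, so "stop" and "wait" are absent unless supplied.
        effective = dict(retry_args)
        effective["retry"] = "<retry predicate placeholder>"
        effective["after"] = "<after-hook placeholder>"
    else:
        # retry_limit and retry_delay are consulted only on this branch.
        effective = {
            "stop": f"<stop derived from retry_limit={retry_limit}>",
            "wait": f"<wait derived from retry_delay={retry_delay}>",
            "retry": "<retry predicate placeholder>",
            "after": "<after-hook placeholder>",
        }
    return effective


defaults = build_retry_args(retry_limit=5, retry_delay=2.0)
custom = build_retry_args(
    retry_limit=5, retry_delay=2.0, retry_args={"reraise": True}
)

print(sorted(defaults))  # includes "stop" and "wait"
print(sorted(custom))    # no "stop" or "wait"; retry_limit/retry_delay unused
```

With no `stop` or `wait` in the resulting dict, the retry library's own defaults apply, which is where the unbounded, zero-backoff loop comes from.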


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

``wait``, ``retry``, ``before``, ``after``, etc.) require tenacity
callable objects, which are not serialization-safe in deferrable mode.

**Not supported** in deferrable mode (will raise ``ValueError`` at task submission):

Contributor


"At task submission" overstates the timing. The validation added by #64960 lives only in the trigger constructors (triggers/databricks.py:61,160), and the operator defers after submitting: submit_run/run_now runs first, then _handle_deferrable_databricks_operator_execution builds the trigger (operators/databricks.py:834-836, 1183-1185). So with invalid retry_args the Databricks run is already launched, and the task fails at defer time, leaving that run in flight. (And if the run is already terminal on the first poll, the trigger is never built and no error is raised at all.) Suggest: "will raise ValueError when the task defers — note the Databricks run has already been submitted at that point".
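
The ordering can be illustrated with a small stdlib simulation. All names here (`FakeTrigger`, `execute_deferrable`, `submitted_runs`) are hypothetical, and `callable(value)` is only a stand-in for the real serializability check; the sketch shows the submit-then-defer sequence, not the operator's implementation.

```python
class FakeTrigger:
    """Hypothetical trigger whose constructor validates retry_args."""

    def __init__(self, run_id, retry_args):
        for key, value in (retry_args or {}).items():
            if callable(value):  # stand-in for "not serializable"
                raise ValueError(f"retry_args[{key!r}] is not serializable")
        self.run_id = run_id
        self.retry_args = retry_args


def execute_deferrable(retry_args, submitted_runs, first_poll_terminal=False):
    """Submit first, then build the trigger (where validation happens)."""
    run_id = len(submitted_runs) + 1
    submitted_runs.append(run_id)  # step 1: the run is launched
    if first_poll_terminal:
        return None  # run already finished: no trigger, no validation
    return FakeTrigger(run_id, retry_args)  # step 2: validation runs here


submitted_runs = []
bad_args = {"retry": lambda exc: True}

try:
    execute_deferrable(bad_args, submitted_runs)
    error = None
except ValueError as exc:
    error = str(exc)

# The ValueError was raised, yet the run was already recorded as submitted.
print(error, submitted_runs)

# If the run is terminal on the first poll, the same bad args raise nothing.
result = execute_deferrable(bad_args, submitted_runs, first_poll_terminal=True)
```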


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

databricks_retry_args = {"retry": my_custom_retry_callable}

If you need a custom callable retry strategy, use the non-deferrable
:class:`~airflow.providers.databricks.operators.DatabricksRunNowOperator` (``deferrable=False``).

Contributor


The cross-reference path is missing the module segment: the class lives at airflow.providers.databricks.operators.databricks.DatabricksRunNowOperator, and operators/__init__.py has no re-exports, so this xref will not resolve. Sibling pages (sql_statements.rst, notebook.rst, task.rst) use the full path. The same short form already exists in the pre-existing intro lines of these two pages — since this PR touches both files, it would be good to fix those occurrences too.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

Retry args in deferrable mode
-----------------------------

When ``deferrable=True``, the ``databricks_retry_args`` dictionary is serialized across the

Contributor


Same comments as the equivalent section in run_now.rst apply to this copy: the {"reraise": True} example, the "plain primitives" framing, the "at task submission" timing, and the :class: path.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

~~~~~~~~

* ``Fail fast for non-serializable retry_args in deferrable operators and triggers (#64960)``
* ``Document supported retry_args shapes for deferrable Databricks operators``

Contributor


Please drop this entry. Per the NOTE TO CONTRIBUTORS at the top of this file, the changelog is maintained semi-automatically by the release manager, and contributor edits are only for breaking-change guidance. A hand-added line (without the (#NNNNN) suffix, and under "Features" for a doc-only change) will conflict with the generated entries at release time — the commit message alone is sufficient.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

@moomindani

Contributor

Small correction to one point in my review: generated/provider_dependencies.json.sha256sum was re-added to main by #68775 a few hours before I posted (the same accidental re-add pattern as #67080, after #68801 had removed it), so "a file that no longer exists upstream" is no longer accurate. The requested action is unchanged: rebase onto current main and keep this file out of this PR's diff.


Drafted-by: Claude Code (Fable 5); reviewed by @moomindani before posting

3 participants